Automated detection of smiles as discrete episodes

Abstract

Background: Patients seeking restorative and orthodontic treatment expect an improvement in their smiles and oral health-related quality of life. Nonetheless, the qualitative and quantitative characteristics of dynamic smiles are yet to be understood.

Objective: To develop, validate, and introduce open-access software for the automated analysis of smiles in terms of their frequency, genuineness, duration, and intensity.

Materials and Methods: A software script was developed using the Facial Action Coding System (FACS) and artificial intelligence to assess activations of (1) the cheek raiser, a marker of smile genuineness; (2) the lip corner puller, a marker of smile intensity; and (3) the perioral lip muscles, a marker of parted lips. Thirty study participants were asked to view a series of amusing videos while a full-face video was recorded using a webcam. The onset and cessation of smile episodes were identified by two examiners trained in FACS coding. A receiver operating characteristic (ROC) curve was then used to assess detection accuracy and optimise thresholding. The videos of the participants were then analysed offline to automatically assess the features of their smiles.

Results: The area under the ROC curve for smile detection was 0.94, with a sensitivity of 82.9% and a specificity of 89.7%. The software correctly identified 90.0% of smile episodes. While watching the amusing videos, study participants smiled 1.6 (±0.8) times per minute.

Conclusions: Features of smiles such as frequency, duration, genuineness, and intensity can be automatically assessed with an acceptable level of accuracy. The software can be used to investigate the impact of oral conditions and their rehabilitation on smiles.

There has been a shift in treatment planning and smile rehabilitation from using static smiles to dynamic smiles; herein lies the 'art of the smile'.6 As the pursuit of better dentofacial aesthetics increases, it is essential to distinguish between posed and spontaneous smiles, whose differences are significant and can influence treatment planning and smile design.5 Understanding the characteristics of different smiles and the associated age-related changes in orofacial musculature, for example, is important to the decision-making process to achieve 'ideal' tooth display.7 However, this process should not be confined to aesthetic elements alone but should also extend to whether an oral rehabilitation treatment, including orthodontics, actually affects how often and in what way a patient smiles.8,9

Smiles that depict situations of spontaneous pure enjoyment or laughter are often referred to as genuine 'Duchenne' smiles, acknowledging the scientist who first described their features.10,11 The Duchenne smile prompts a combined activation of the zygomaticus major and the orbicularis oculi muscles. This pattern of muscular activity distinguishes genuine smiles from 'social' smiles, which are generally expressed during conditions of non-enjoyment.12,13 The identification of Duchenne smiles relies on subtle analysis of facial expressions.14

The Facial Action Coding System (FACS)15 is a popular and reliable method for detecting and quantifying the frequency of facial expressions from full-face video recordings.16 FACS uses action units (AUs), which code for actions of individual muscles or muscle groups during facial expression.15 The activation level of each AU is scored using intensity scores, ranging from 'trace' to 'maximum'. According to FACS, the onset of a smile can be identified when the activation of the zygomaticus major displays traces of raised skin within the lower-to-middle nasolabial area and traces of upwardly angled and elongated lip corners.15 These muscle activities increase in intensity until the smile apex is reached and then revert until no further traces of activation of the zygomaticus major can be recognised, denoting the smile offset.15

The introduction of FACS has undoubtedly advanced the study of facial expressions, as it allows real-time assessment of emotions; however, its use for manual detection and coding of AUs presents several limitations: (a) the need for experienced coders who can accurately identify, on a frame-wise basis, the onset, apex, and offset of a smile;16 (b) an extremely laborious coding process, posing a huge challenge for large-scale research; and (c) susceptibility to observer biases17 and high costs.18 The limitations of manual analyses of smiles have led to computing developments to automatically detect dynamic smiling features.19

FACS focuses primarily on the frame-by-frame identification of active target AUs and does not include comprehensive analyses of smiling as discrete episodes whose individual features and patterns can be characterised. An episode-wise analysis of individual smiles would allow researchers to address questions such as how often, how long, how strongly, and how genuinely individuals smile under different experimental and/or situational factors, and what impact factors such as oral health-related conditions have on the way people smile.
This would also pave the way to understanding the dynamic characteristics of smiles in oral rehabilitation patients8 and assist in areas where smile rehabilitation through individualised muscle mimicry and training is required.20 The aim of this study was to develop and validate a user-friendly software script, based on well-established pattern-recognition algorithms for tracking facial landmarks and facial AUs, so that discrete smile episodes can be analysed offline from full-face videos and quantified in terms of smile frequency, duration, authenticity, and intensity.

| MATERIALS AND METHODS
The study included two phases. During the first phase, a software script was developed with the help of a computer scientist (HB) and extensively tested with ongoing feedback from a focus group comprising the authors and a few test volunteers. During the second phase, preliminary data were collected from a convenience sample of thirty study participants to optimise the performance of the smile-detection algorithm and to identify optimal thresholds, so that the software's performance could be validated against two manual coders.

| Phase 1: software script
OpenFace 2.2.0 was used as a platform to extract information about the facial AUs considered relevant for this study.21 OpenFace is an open-source automatic facial recognition toolkit intended for researchers in machine learning, affective computing, and facial behaviour analysis.21 The software is an update of a previous version of a facial behaviour analysis toolkit, is based on convolutional neural networks, and allows automated identification of 68 facial landmarks at any frame rate.21 To identify the onset of a smiling episode, both AU6 and AU12 had to be above the specified thresholds. The end of the smiling episode was identified by a sub-threshold activation of either AU for longer than 2 s. In effect, this means that when two or more smiling episodes were separated by less than 2 s, they were merged into a single episode. This stand-by time could be changed by the user.
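As a rough illustration of this episode-detection logic (a sketch under stated assumptions, not the authors' published script), the following Python snippet scans an OpenFace 2.2.0 output CSV. The AU intensity columns (AU06_r, AU12_r, on a 0-5 scale) and the timestamp column are part of OpenFace's standard output; the thresholds follow those shown in Figure 1 (AU6 > 0.5, AU12 > 1.5).

```python
import pandas as pd

def detect_smile_episodes(csv_path, au6_thr=0.5, au12_thr=1.5, standby=2.0):
    """Return a list of (onset_time, offset_time) smile episodes in seconds."""
    df = pd.read_csv(csv_path)
    df.columns = df.columns.str.strip()  # OpenFace pads column names with spaces

    # A frame belongs to a smile when BOTH AU6 and AU12 exceed their thresholds.
    smiling = (df["AU06_r"] > au6_thr) & (df["AU12_r"] > au12_thr)
    t = df["timestamp"].to_numpy()

    episodes, onset, last_active = [], None, None
    for i, active in enumerate(smiling):
        if active:
            if onset is None:          # a new episode begins
                onset = t[i]
            last_active = t[i]
        elif onset is not None and t[i] - last_active > standby:
            # Sub-threshold activity for longer than the stand-by time
            # closes the episode; shorter gaps are implicitly merged.
            episodes.append((onset, last_active))
            onset = None
    if onset is not None:              # episode still open at end of video
        episodes.append((onset, last_active))
    return episodes
```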
For every smiling episode, the software reported a progressive count, the onset time, the duration, and the mean activation of AU6 and AU12 across the entire episode. The onset and duration of individual episodes were given at a resolution equal to the inverse of the frame rate of the video analysed (e.g., 1/30 s ≈ 33 ms for a 30-frames-per-second video).
In order to assign a clinically meaningful value to AU25, this was reported as the proportion of time teeth were shown during a given smiling episode. For example, an activity value of 50% indicates that the teeth were visible for half of the duration of the episode.
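Continuing the sketch above, a per-episode summary along the lines described here might look as follows. The AU25 threshold of 0.5 used to decide that teeth are shown in a frame is our assumption, not a value given in the paper; `df` is the (column-stripped) OpenFace frame table and `onset`/`offset` come from the previous sketch.

```python
def summarise_episode(df, onset, offset, au25_thr=0.5):
    """Summarise one smile episode from OpenFace frame data."""
    frames = df[(df["timestamp"] >= onset) & (df["timestamp"] <= offset)]
    return {
        "onset_s": onset,
        "duration_s": offset - onset,            # resolution = 1 / frame rate
        "genuineness": frames["AU06_r"].mean(),  # mean AU6 activation, 0-5
        "intensity": frames["AU12_r"].mean(),    # mean AU12 activation, 0-5
        # Proportion of frames in which teeth were shown (AU25 active).
        "teeth_exposure_pct": 100 * (frames["AU25_r"] > au25_thr).mean(),
    }
```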

| Phase 2: descriptive study and software validation
Data were collected at the Craniofacial Clinical Research Laboratory at the Faculty of Dentistry, University of Otago, under local Ethics Committee approval number H19/160. All participants enrolled in this study agreed to participate and signed a written informed consent form. The report of Phase 2 conforms to the guidelines for reporting observational studies (STROBE).25

The occlusal characteristics of study participants were assessed using the Dental Aesthetic Index (DAI), a popular epidemiological tool for assessing a specific set of occlusal traits, such as missing anterior teeth, crowding and spacing in the incisal region, midline diastema, overjet, anterior open bite, incisor irregularity, and molar relationship.26 The weighted component scores are summed with a constant of 13 to produce the final DAI aggregate.

Figure 1: Example frame from a smiling study participant, with activation of AU6 > 0.5 and AU12 > 1.5.
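For illustration, the DAI aggregate described above is a weighted sum of occlusal-trait scores plus the constant of 13. The weights below are the commonly cited DAI regression coefficients and are included here as an assumption; the paper itself only states that the weighted components are summed with a constant of 13.

```python
# Commonly cited DAI regression weights (assumed, not taken from this paper).
DAI_WEIGHTS = {
    "missing_visible_teeth": 6,
    "crowded_incisal_segments": 1,
    "spaced_incisal_segments": 1,
    "midline_diastema_mm": 3,
    "largest_maxillary_irregularity_mm": 1,
    "largest_mandibular_irregularity_mm": 1,
    "anterior_maxillary_overjet_mm": 2,
    "anterior_mandibular_overjet_mm": 4,
    "vertical_anterior_open_bite_mm": 4,
    "antero_posterior_molar_relation": 3,  # 0 = normal, 1 = half cusp, 2 = full cusp
}

def dai_aggregate(traits: dict) -> float:
    """Weighted sum of occlusal-trait scores plus the DAI constant of 13."""
    return sum(DAI_WEIGHTS[key] * value for key, value in traits.items()) + 13
```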

| Experimental setup
An Ultra High-Definition web camera (Logitech BRIO 4K Ultra High-Definition Webcam), with the resolution set to 4096 × 2160 pixels and the frame rate set to 30 frames per second, was secured atop a 27-inch Dell UltraSharp U2715H computer monitor (2560 × 1440 pixels), which was used to display the video clips.
Each participant was seated 60-70 cm away from the display monitor. The height of the monitor was adjusted so that the participant's eyes were aligned with the middle of the screen when the participant's head was in natural head position.
Face lighting was individually optimised by a ring light (APEXEL 10″ 26 cm LED Selfie Circle Ring, Apexel), which was also secured to the back of the screen. A neutral background was used to avoid light reflections and interference from objects, which could affect offline analyses of the video. The room light was switched off during the entire recording.

| Smile triggering video
Three amusing video clips were identified via a small pilot study by the focus group previously described. The first clip showed an episode of Mr Bean (Mr Bean Rides Again, Act 5: The Flight; 3 min), whose character is widely used as a trigger stimulus in smile research.27 Participants then performed a series of follow-up facial tasks, separated by an inter-task interval. All tasks were administered once, except for smiling, which was repeated three times. These tasks allowed precise tuning of the machine learning models applied to detect smiling episodes in the video and individual-specific calibration of the algorithm.

| Procedure
Each participant's involvement in the study took place in a single session. At the start of this session, each participant was checked against the inclusion/exclusion criteria, and the occlusal characteristics were scored using the DAI. The participants were then given an overview of the research project and signed the written consent forms for participation. To elicit natural responses and trigger spontaneous smiling reactions during the video recording, the participants were not told that the main outcomes of the study were the features of their smiles. Afterwards, each participant was left alone in the recording room and was requested to view the video clip and then perform the follow-up tasks.
After viewing the video, each participant was asked to fill in two questionnaires. The first was a 12-item Smile Aesthetics-Related Quality of Life (SERQoL) questionnaire relating to three dimensions of the psychosocial impact of smiles. 29 The second was the 60-item IPIP-NEO-60 personality scale. 30 The results of these questionnaires were the subject of another investigation and are not analysed in this report. Each participant was given a $20 voucher as reimbursement for participation in this project.

| Data analysis and statistics
The full-face videos were reviewed and coded frame-wise by two examiners (HM and RK), who were instructed to identify each distinct smiling episode (i.e. preceded and followed by a smile-free period of at least two seconds) for each study participant. The two coders viewed the full-face videos in the same setting and noted the frames corresponding to the onset and cessation of each smiling episode until a consensus was reached. When consensus was not reached, a third coder (MF) was consulted.
The validity of the smiling detection software was assessed by calculating receiver operating characteristic (ROC) curves, using the examiner-coded smiles as the reference standard and classification variable. ROC analysis was performed frame-wise across the smiling and smile-free portions of each recording. Sensitivity (Se, true positive rate) and specificity (Sp, true negative rate) were calculated frame-wise, and the detection threshold was optimised by maximising the Youden index (Se + Sp − 1). To obtain estimates of smile genuineness (0-5), intensity (0-5), and teeth exposure (%), the activations of AU6, AU12, and AU25 were averaged across each episode. The outcome variables considered in this study were the number of smiling episodes per session, the mean and cumulative duration of smiling episodes, and the mean activation of AU6, AU12, and AU25.
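As an illustration of this validation step, the following sketch performs a frame-wise ROC analysis with Youden-index thresholding using scikit-learn. The data here are simulated placeholders rather than the study's recordings, and the variable names are ours.

```python
import numpy as np
from sklearn.metrics import roc_curve, auc

# Simulated stand-ins: 1 = frame inside an examiner-coded smile episode;
# `score` mimics a continuous per-frame smile score (e.g., AU12 intensity).
rng = np.random.default_rng(0)
reference = rng.integers(0, 2, 1000)
score = reference * 2.0 + rng.normal(0.0, 1.0, 1000)

fpr, tpr, thresholds = roc_curve(reference, score)
print(f"AUC = {auc(fpr, tpr):.2f}")

# Youden index J = Se + Sp - 1 = tpr - fpr; pick the threshold maximising J.
j = tpr - fpr
best = j.argmax()
print(f"Optimal threshold = {thresholds[best]:.2f} "
      f"(Se = {tpr[best]:.1%}, Sp = {1 - fpr[best]:.1%})")
```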
All the data were analysed in Excel (version 16.51, Microsoft Corporation) and SPSS (version 20.0, IBM Corporation).

| RESULTS
Study participants were young adults, mostly Caucasian (>80%) and about half female, with a broad range of malocclusions (Table 1).
The distinct smile episodes that were manually identified frame-wise by the coders were used to build a ROC curve (Figure 2); descriptive results are summarised in Table 2. Activation of AU12, the main AU of the smile,15 ranged from slight to pronounced in intensity. Some participants hardly showed their teeth when smiling, while others showed teeth throughout the entire smile episode.

| DISCUSSION
This paper presents a user-friendly automated software script, which can detect and quantify smile features in terms of their (1) frequency, (2) genuineness, (3) duration, and (4) intensity. The measure of diagnostic accuracy includes both sensitivity and specificity values.32 In our study, the sensitivity of 82.9% demonstrates that a high proportion of true smile episodes were detected.
In addition, the diagnostic specificity was 89.7%, as presented in the ROC curve plot. Taken together, both values align well with expectations in the area of automated facial expression recognition and dynamic analysis of human emotions.33,34

Descriptive values from the automated analysis of the sample clips showed that participants smiled around two times per minute, on average for around 11 s per episode, and that the mean intensity of zygomaticus major activation (AU12) was 2.2 ± 0.4. These findings align with previous research in which participants viewed a funny clip, with a mean duration of AU12 activation of 13.8 ± 12.7 s and a maximum intensity of 1.8 ± 1.1.35 Furthermore, a recent study reported a mean AU12 intensity of 4.1 during genuine smiling and 3.9 during posed smiles.36 Although these AU12 intensity values appear comparable across genuine and posed smiles, previous research pointed to recognisable differences in AU6 activity between genuine and posed smiles, arguing that it would be difficult to deliberately fake a genuine smile.10,37 However, there is also some evidence suggesting that Duchenne markers merely trace smile intensity rather than serving as a reliable and distinct indicator of smile authenticity.38 Such discrepancies in the reported findings may be ascribed to differences in the methods used to trigger and measure smiles, the social context, and the sociodemographic characteristics of the samples, all of which may influence the features of smiling.39 Nevertheless, these discrepancies still limit our understanding, recognition, and differentiation of genuine and posed smiles.
Smiling is an expression that can be triggered on demand, as well as spontaneously within a social context. In Phase 2, amusing video clips were used as triggers to elicit spontaneous smiles and to increase the external validity of our findings.46 In addition, it is important to note that the yielded accuracy was not perfect, although an AUC value close to 1 (0.94 in our report) is viewed as very high in terms of the discrimination performance of the software.47 Further enhancements are plausible with future improvements in AI. These enhancements should also include other methods of quantifying muscular activity to objectively assess genuine smiles, such as electromyography (EMG) and wearable detection devices.48 The possible effect of calibration on smile detection accuracy also needs to be further investigated, as does the capability of the algorithm to detect and analyse smiles under other recording conditions.

CONFLICT OF INTEREST
All authors declare that they have no competing interests.

PEER REVIEW
The peer review history for this article is available at https://publons.com/publon/10.1111/joor.13378.

DATA AVAILABILITY STATEMENT
The data are available upon reasonable request.